runtime monitor
Explaining Unreliable Perception in Automated Driving: A Fuzzy-based Monitoring Approach
Salvi, Aniket, Weiss, Gereon, Trapp, Mario
Autonomous systems that rely on Machine Learning (ML) utilize online fault tolerance mechanisms, such as runtime monitors, to detect ML prediction errors and maintain safety during operation. However, the lack of human-interpretable explanations for these errors can hinder the creation of strong assurances about the system's safety and reliability. This paper introduces a novel fuzzy-based monitor tailored for ML perception components. It provides human-interpretable explanations about how different operating conditions affect the reliability of perception components and also functions as a runtime safety monitor. We evaluated our proposed monitor on naturalistic driving datasets as part of an automated driving case study, assessed its interpretability, and identified a set of operating conditions in which the perception component performs reliably. Additionally, we created an assurance case that links unit-level evidence of *correct* ML operation to system-level *safety*. The benchmarking demonstrated that our monitor achieved a greater increase in safety (i.e., absence of hazardous situations) while maintaining availability (i.e., ability to perform the mission) compared to state-of-the-art runtime ML monitors on the evaluated dataset.
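The core idea lends itself to a compact illustration. Below is a minimal Python sketch of a fuzzy reliability monitor; the operating conditions, membership functions, and rule are illustrative assumptions, not the paper's actual design:

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal fuzzy membership function on [a, d] with plateau [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def perception_reliability(visibility_m, rain_mm_h):
    """Fuzzy degree to which current conditions support reliable perception."""
    good_visibility = trapezoid(visibility_m, 50, 200, 1e4, 1e4 + 1)
    heavy_rain = trapezoid(rain_mm_h, 2, 8, 1e3, 1e3 + 1)
    # Rule: perception is reliable IF visibility is good AND rain is NOT heavy.
    return min(good_visibility, 1.0 - heavy_rain)

print(perception_reliability(visibility_m=500, rain_mm_h=0.5))  # 1.0 -> trust ML output
print(perception_reliability(visibility_m=80, rain_mm_h=10.0))  # 0.0 -> trigger fallback
```

Because each rule reads as a plain-language statement about operating conditions, the monitor's verdicts remain human-interpretable in a way that raw confidence scores are not.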
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > Finland > Southwest Finland > Turku (0.04)
- Automobiles & Trucks (1.00)
- Transportation > Ground > Road (0.85)
- Information Technology > Robotics & Automation (0.71)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.71)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Digital Twin Enabled Runtime Verification for Autonomous Mobile Robots under Uncertainty
Betzer, Joakim Schack, Boudjadar, Jalil, Frasheri, Mirgita, Talasila, Prasad
As autonomous robots increasingly navigate complex and unpredictable environments, ensuring their reliable behavior under uncertainty becomes a critical challenge. This paper introduces digital twin-based runtime verification for an autonomous mobile robot to mitigate the impact of uncertainty in the deployment environment. The safety and performance properties are specified and synthesized as runtime monitors using TeSSLa. The integration of the executable digital twin, via the MQTT protocol, enables continuous monitoring and validation of the robot's behavior in real time. We explore the sources of uncertainty, including sensor noise and environment variations, and analyze their impact on the robot's safety and performance. Equipped with ample computational resources, the cloud-hosted digital twin serves as a watchdog model that estimates the actual state, checks the consistency of the robot's actuations, and intervenes to override those actuations if a safety or performance property is about to be violated. The experimental analysis demonstrated the high efficiency of the proposed approach in ensuring the reliability and robustness of the autonomous robot's behavior in uncertain environments and in keeping the actual and expected speeds closely aligned, reducing their difference by up to 41% compared to the default robot navigation control.
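To make the watchdog pattern concrete, here is a minimal Python sketch of the MQTT side of such a digital twin (using paho-mqtt >= 2.0). The topic names, payload fields, and tolerance are assumptions, and the TeSSLa-synthesized property checks are reduced to a single speed-consistency test:

```python
import json
import paho.mqtt.client as mqtt

SPEED_TOLERANCE = 0.1  # m/s; assumed tolerance, not taken from the paper

def on_message(client, userdata, msg):
    # The twin receives the robot's reported state and compares the actual
    # speed against the speed its own model expects (fields are assumed).
    state = json.loads(msg.payload)
    deviation = abs(state["actual_speed"] - state["expected_speed"])
    if deviation > SPEED_TOLERANCE:
        # Property about to be violated: override the robot's actuation.
        client.publish("robot/override",
                       json.dumps({"cmd": "set_speed",
                                   "value": state["expected_speed"]}))

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("robot/state")
client.loop_forever()
```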
- Europe > Switzerland (0.04)
- Europe > Sweden > Östergötland County > Linköping (0.04)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- (2 more...)
Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress
Agia, Christopher, Sinha, Rohan, Yang, Jingyun, Cao, Zi-ang, Antonova, Rika, Pavone, Marco, Bohg, Jeannette
Robot behavior policies trained via imitation learning are prone to failure under conditions that deviate from their training data. Thus, algorithms that monitor learned policies at test time and provide early warnings of failure are necessary to facilitate scalable deployment. We propose Sentinel, a runtime monitoring framework that splits the detection of failures into two complementary categories: 1) erratic failures, which we detect using statistical measures of temporal action consistency, and 2) task progression failures, where we use Vision Language Models (VLMs) to detect when the policy confidently and consistently takes actions that do not solve the task. Our approach has two key strengths. First, because learned policies exhibit diverse failure modes, combining complementary detectors leads to significantly higher accuracy in failure detection. Second, using a statistical temporal action consistency measure ensures that we quickly detect when multimodal, generative policies exhibit erratic behavior at negligible computational cost. In contrast, we only use VLMs to detect failure modes that are less time-sensitive. We demonstrate our approach in the context of diffusion policies trained on robotic mobile manipulation domains in both simulation and the real world. By unifying temporal consistency detection and VLM runtime monitoring, Sentinel detects 18% more failures than either of the two detectors alone and significantly outperforms baselines, highlighting the importance of assigning specialized detectors to complementary categories of failure. Qualitative results are made available at https://sites.google.com/stanford.edu/sentinel.
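Sentinel's exact consistency statistic is not reproduced here; the sketch below only illustrates the general idea, using a maximum mean discrepancy (MMD) between action batches sampled from the policy at consecutive timesteps (the threshold and kernel width are assumptions):

```python
import numpy as np

def mmd_rbf(x, y, sigma=1.0):
    """Squared MMD with an RBF kernel between two sets of sampled actions."""
    def k(a, b):
        d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def erratic(prev_actions, curr_actions, threshold=0.5):
    """Flag erratic behavior when the action distributions sampled at
    consecutive timesteps drift apart."""
    return mmd_rbf(prev_actions, curr_actions) > threshold

rng = np.random.default_rng(0)
steady = rng.normal(0.0, 0.1, (64, 2))   # 64 sampled 2-D actions
jumpy = rng.normal(3.0, 1.0, (64, 2))
print(erratic(steady, rng.normal(0.0, 0.1, (64, 2))))  # False: consistent
print(erratic(steady, jumpy))                          # True: erratic
```

Such a statistic is cheap to evaluate every timestep, which is why the framework reserves the slower VLM queries for the less time-sensitive task progression failures.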
- North America > United States > California > Santa Clara County > Palo Alto (0.24)
- North America > United States > New York > New York County > New York City (0.14)
- Asia > South Korea > Daegu > Daegu (0.04)
- (8 more...)
Diagnostic Runtime Monitoring with Martingales
Hindy, Ali, Luo, Rachel, Banerjee, Somrita, Kuck, Jonathan, Schmerling, Edward, Pavone, Marco
Machine learning systems deployed in safety-critical robotics settings must be robust to distribution shifts. However, system designers must understand the cause of a distribution shift in order to implement the appropriate intervention or mitigation strategy and prevent system failure. In this paper, we present a novel framework for diagnosing distribution shifts in a streaming fashion by deploying multiple stochastic martingales simultaneously. We show that knowledge of the underlying cause of a distribution shift can lead to proper interventions over the lifecycle of a deployed system. Our experimental framework can easily be adapted to different types of distribution shifts, models, and datasets. We find that our method outperforms existing work on diagnosing distribution shifts in terms of speed, accuracy, and flexibility, and validate the efficiency of our model in both simulated and live hardware settings.
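As one concrete instance of the idea, the following sketch builds a standard power martingale over conformal p-values; the paper deploys several martingales in parallel to diagnose the cause of a shift, and the score and p-value construction here are simplified assumptions:

```python
import numpy as np

def conformal_p_values(ref_scores, stream_scores, rng):
    """Randomized conformal p-values: uniform on [0, 1] while the stream
    matches the reference distribution, small once scores drift upward."""
    ref = np.asarray(ref_scores)
    n = len(ref) + 1
    return np.array([((ref > s).sum() + rng.uniform()) / n for s in stream_scores])

def power_martingale(p_values, eps=0.5):
    """Test martingale prod_i eps * p_i**(eps - 1); it stays O(1) under
    uniform p-values and grows without bound under distribution shift."""
    return np.exp(np.cumsum(np.log(eps) + (eps - 1) * np.log(p_values)))

rng = np.random.default_rng(1)
ref = rng.normal(0, 1, 500)                        # in-distribution scores
stream = np.concatenate([rng.normal(0, 1, 100),    # no shift ...
                         rng.normal(2, 1, 100)])   # ... then a mean shift
m = power_martingale(conformal_p_values(ref, stream, rng))
print(m[99], m[-1])   # small before the shift, explodes after it
```

Running several such martingales, each fed by a nonconformity score sensitive to a different kind of shift, is what allows the detector to say not just *that* the distribution shifted but *why*.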
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > San Mateo County > Redwood City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.47)
Closing the Loop on Runtime Monitors with Fallback-Safe MPC
Sinha, Rohan, Schmerling, Edward, Pavone, Marco
When we rely on deep-learned models for robotic perception, we must recognize that these models may behave unreliably on inputs dissimilar from the training data, compromising the closed-loop system's safety. This raises fundamental questions about how we can assess confidence in perception systems and to what extent we can take safety-preserving actions when external environmental changes degrade our perception model's performance. Therefore, we present a framework to certify the safety of a perception-enabled system deployed in novel contexts. To do so, we leverage robust model predictive control (MPC) to control the system using the perception estimates while maintaining the feasibility of a safety-preserving fallback plan that does not rely on the perception system. In addition, we calibrate a runtime monitor using recently proposed conformal prediction techniques to certifiably detect when the perception system degrades beyond the tolerance of the MPC controller, resulting in an end-to-end safety assurance. We show that this control framework and calibration technique allow us to certify the system's safety with orders of magnitude fewer samples than would be required to retrain the perception network when deploying in a novel context on a photo-realistic aircraft taxiing simulator. Furthermore, we illustrate the safety-preserving behavior of the MPC on simulated examples of a quadrotor. We open-source our simulation platform and provide videos of our results at our project page: https://tinyurl.com/fallback-safe-mpc.
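A minimal sketch of the calibration step, assuming a scalar perception-error score and a standard split-conformal quantile; the coupling to the MPC's tolerance is the paper's contribution and is not reproduced here:

```python
import numpy as np

def calibrate_threshold(calib_scores, alpha=0.05):
    """Split-conformal threshold: a fresh in-distribution score exceeds it
    with probability at most alpha (under exchangeability)."""
    n = len(calib_scores)
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    return np.sort(calib_scores)[min(rank, n) - 1]

rng = np.random.default_rng(2)
calib = np.abs(rng.normal(0, 1, 1000))   # perception-error scores, nominal context
tau = calibrate_threshold(calib, alpha=0.05)

def degraded(score):
    """True when the perception error exceeds the calibrated threshold;
    the controller should then switch to the fallback-safe plan."""
    return score > tau

print(tau, degraded(0.5), degraded(5.0))   # -> tau, False, True
```

Because the guarantee comes from the quantile of a modest calibration set rather than from retraining, recalibrating in a new context needs far fewer samples than retraining the perception network.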
Out of Distribution Detection via Domain-Informed Gaussian Process State Space Models
Marco, Alonso, Morley, Elias, Tomlin, Claire J.
In order for robots to safely navigate unseen scenarios using learning-based methods, it is important to accurately detect out-of-training-distribution (OoD) situations online. Recently, Gaussian process state-space models (GPSSMs) have proven useful for discriminating unexpected observations by comparing them against probabilistic predictions. However, the model's capability to correctly distinguish between in- and out-of-training-distribution observations hinges on the accuracy of these predictions, which is primarily affected by the class of functions the GPSSM kernel can represent. In this paper, we propose (i) a novel approach to embed existing domain knowledge in the kernel and (ii) an OoD online runtime monitor based on receding-horizon predictions. Domain knowledge is provided in the form of a dataset, collected either in simulation or by using a nominal model. Numerical results show that the informed kernel yields better regression quality with smaller datasets, as compared to standard kernel choices. We demonstrate the effectiveness of the OoD monitor on a real quadruped navigating an indoor setting, where it reliably classifies previously unseen terrains.
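To illustrate the monitoring side (the domain-informed kernel itself is the paper's contribution and is not reproduced), here is a sketch using a plain GP regressor as a stand-in for the GPSSM, with a one-step predictive-interval check in place of receding-horizon predictions; the toy dynamics and thresholds are assumptions:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# "Domain knowledge" dataset from a nominal model: x_next = f(x, u) + noise.
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (200, 2))                  # (state, input) pairs
y = 0.9 * X[:, 0] + 0.3 * np.sin(X[:, 1]) + rng.normal(0, 0.05, 200)

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, y)

def ood_monitor(x, observed_next, n_sigma=3.0):
    """Flag an observation as OoD when it falls outside the GP's predictive
    interval; a one-step stand-in for the paper's receding-horizon check."""
    mean, std = gp.predict(np.atleast_2d(x), return_std=True)
    return abs(observed_next - mean[0]) > n_sigma * std[0]

print(ood_monitor([0.2, 0.1], 0.9 * 0.2 + 0.3 * np.sin(0.1)))  # False: expected
print(ood_monitor([0.2, 0.1], 2.5))                            # True: unexpected dynamics
```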
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
What, Indeed, is an Achievable Provable Guarantee for Learning-Enabled Safety Critical Systems
Bensalem, Saddek, Cheng, Chih-Hong, Huang, Wei, Huang, Xiaowei, Wu, Changshun, Zhao, Xingyu
Machine learning has made remarkable advancements, but confidently utilising learning-enabled components in safety-critical domains still poses challenges. Among these challenges, finding a rigorous, yet practical, way of achieving safety guarantees is one of the most prominent. In this paper, we first discuss the engineering and research challenges associated with the design and verification of such systems. Then, based on the observation that existing works cannot actually achieve provable guarantees, we promote a two-step verification method for the ultimate achievement of provable statistical guarantees.
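For intuition about what a statistical guarantee from testing looks like, and why it is hard to achieve in practice, consider the elementary bound below (the paper's point is precisely that a two-step method is needed to make such bounds actually hold; this sketch shows only the bound itself):

```python
from scipy.stats import beta

def failure_rate_upper_bound(failures, trials, confidence=0.99):
    """Clopper-Pearson upper confidence bound on the true failure
    probability after observing `failures` failures in `trials` i.i.d. trials."""
    if failures == trials:
        return 1.0
    return float(beta.ppf(confidence, failures + 1, trials - failures))

# Zero failures in 10,000 independently sampled test scenarios still only
# supports a bound of roughly 4.6e-4 at 99% confidence -- far above the
# much smaller failure rates typically demanded in safety-critical domains.
print(failure_rate_upper_bound(0, 10_000))
```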
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- Asia > India > Maharashtra > Pune (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (9 more...)
- Research Report (0.64)
- Overview (0.46)
An investigation of challenges encountered when specifying training data and runtime monitors for safety critical ML applications
Heyn, Hans-Martin, Knauss, Eric, Malleswaran, Iswarya, Dinakaran, Shruthi
Context and motivation: The development and operation of critical software that contains machine learning (ML) models requires diligence and established processes. Especially the training data used during the development of ML models has a major influence on the later behaviour of the system. Runtime monitors are used to provide guarantees for that behaviour. Question / problem: We see major uncertainty in how to specify training data and runtime monitoring for critical ML models and, by extension, the final functionality of the system. In this interview-based study we investigate the underlying challenges behind these difficulties. Principal ideas/results: Based on ten interviews with practitioners who develop ML models for critical applications in the automotive and telecommunication sectors, we identified 17 underlying challenges in 6 challenge groups related to specifying training data and runtime monitoring. Contribution: The article provides a list of the identified underlying challenges related to the difficulties practitioners experience when specifying training data and runtime monitoring for ML models. Furthermore, interconnections between the challenges were found, and based on these connections, recommendations are proposed to overcome the root causes of the challenges.
- Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
- North America > United States (0.04)
- Personal > Interview (0.66)
- Research Report > New Finding (0.47)
- Information Technology (1.00)
- Automobiles & Trucks (1.00)
- Telecommunications (0.68)
- Law (0.68)
Out-Of-Distribution Detection Is Not All You Need
Guérin, Joris, Delmas, Kevin, Ferreira, Raul Sena, Guiochet, Jérémie
The usage of deep neural networks in safety-critical systems is limited by our ability to guarantee their correct behavior. Runtime monitors are components aiming to identify unsafe predictions and discard them before they can lead to catastrophic consequences. Several recent works on runtime monitoring have focused on out-of-distribution (OOD) detection, i.e., identifying inputs that are different from the training data. In this work, we argue that OOD detection is not a well-suited framework for designing efficient runtime monitors and that it is more relevant to evaluate monitors based on their ability to discard incorrect predictions. We call this setting out-of-model-scope detection and discuss the conceptual differences with OOD. We also conduct extensive experiments on popular datasets from the literature to show that studying monitors in the OOD setting can be misleading: 1. very good OOD results can give a false impression of safety, and 2. comparisons under the OOD setting do not allow identifying the best monitor for detecting errors. Finally, we also show that removing erroneous training data samples helps to train better monitors.
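The shift in evaluation target is easy to state in code. A minimal sketch, where the metric names and the synthetic monitor are illustrative assumptions:

```python
import numpy as np

def oms_metrics(monitor_rejects, model_correct):
    """Score a monitor by its ability to discard incorrect predictions
    (out-of-model-scope detection), ignoring OOD labels entirely."""
    errors = ~model_correct
    caught = (monitor_rejects & errors).sum() / max(errors.sum(), 1)
    wasted = (monitor_rejects & model_correct).sum() / max(model_correct.sum(), 1)
    return {"errors_discarded": caught, "correct_discarded": wasted}

# An in-distribution input can still be misclassified, and an OOD input can
# still be handled correctly; only the (reject, correct) pairing matters.
rng = np.random.default_rng(4)
correct = rng.random(1000) > 0.1                    # model is right 90% of the time
rejects = (~correct) ^ (rng.random(1000) < 0.05)    # near-ideal monitor, 5% noise
print(oms_metrics(rejects, correct))
```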
Unifying Evaluation of Machine Learning Safety Monitors
Guerin, Joris, Ferreira, Raul Sena, Delmas, Kevin, Guiochet, Jérémie
With the increasing use of Machine Learning (ML) in critical autonomous systems, runtime monitors have been developed to detect prediction errors and keep the system in a safe state during operation. Monitors have been proposed for different applications involving diverse perception tasks and ML models, and specific evaluation procedures and metrics are used in different contexts. This paper introduces three unified safety-oriented metrics, representing the safety benefits of the monitor (Safety Gain), the remaining safety gaps after using it (Residual Hazard), and its negative impact on the system's performance (Availability Cost). Computing these metrics requires defining two return functions, representing how a given ML prediction will impact expected future rewards and hazards. Three use cases (classification, drone landing, and autonomous driving) are used to demonstrate how metrics from the literature can be expressed in terms of the proposed metrics. Experimental results on these examples show how different evaluation choices impact the perceived performance of a monitor. As our formalism requires us to formulate explicit safety assumptions, it allows us to ensure that the evaluation conducted matches the high-level system requirements.
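A sketch of how the three metrics might be computed once the two return functions are defined; the aggregation below (simple means over evaluation scenarios) is an assumption for illustration, not the paper's exact formalism:

```python
import numpy as np

def unified_metrics(hazard_return, reward_return,
                    hazard_return_monitored, reward_return_monitored):
    """hazard_return / reward_return: expected future hazards / rewards per
    scenario for the unmonitored system; *_monitored: the same quantities
    with the monitor active (e.g., rejections trigger a safe fallback)."""
    safety_gain = np.mean(hazard_return) - np.mean(hazard_return_monitored)
    residual_hazard = np.mean(hazard_return_monitored)
    availability_cost = np.mean(reward_return) - np.mean(reward_return_monitored)
    return safety_gain, residual_hazard, availability_cost

# A monitor that removes most hazards at a small cost in mission reward:
sg, rh, ac = unified_metrics(hazard_return=[1.0, 0.0, 1.0, 1.0],
                             reward_return=[1.0, 1.0, 1.0, 1.0],
                             hazard_return_monitored=[0.0, 0.0, 1.0, 0.0],
                             reward_return_monitored=[0.8, 1.0, 0.9, 0.8])
print(sg, rh, ac)   # 0.5, 0.25, 0.125
```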
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Europe > United Kingdom > England > North Yorkshire > York (0.04)
- Automobiles & Trucks (1.00)
- Transportation > Ground > Road (0.88)
- Information Technology > Robotics & Automation (0.88)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)